Abstract

Background: Citrus tristeza virus (CTV), a member of the genus Closterovirus within the family Closteroviridae, is the
causal agent of citrus tristeza disease. Previous studies revealed that negative selection, RNA recombination and
gene flow were the most important forces driving CTV evolution. However, CTV codon usage has not been
studied, and thus its role in CTV evolution remains unknown.

Results: A detailed comparative analysis of the CTV codon usage pattern was performed in this study. The results
show that although CTV in general does not have a high degree of codon usage bias, its codon usage has
a high level of resemblance to that of its host. In addition, our data indicate that this codon usage
resemblance is observed only for the woody plant-infecting closteroviruses and not for the closteroviruses infecting
herbaceous host plants, suggesting the existence of different virus-host interactions between the herbaceous
plant-infecting and woody plant-infecting closteroviruses.

Conclusion: Based on the results, we suggest that in addition to RNA recombination, negative selection and gene 
flow, host plant codon usage selection can also affect CTV evolution. 

Keywords: Citrus tristeza virus, Synonymous codon usage, Citrus sinensis, Codon resemblance, Virus-host 
interaction 



Background 

Protein synthesis takes place when the genetic code stored in the
genome is translated at ribosomes three nucleotides at a time, from
the 5' to the 3' end. Each nucleotide triplet is a codon that
specifies an amino acid or a translation stop signal. There are 64
codons: 61 encode the 20 standard amino acids and three are stop
codons, so most amino acids are encoded by more than one codon.
Codons that encode the same amino acid are known as synonymous
codons. Synonymous codons are not used at the same frequency in
different genes or organisms, indicating the existence of biases in
codon usage [1]. Bias in codon usage may play an important role in
the evolutionary history of genes or organisms [2]. Codon usage bias
has been reported to be influenced by many factors, including
translational selection, mutation pressure, gene transfer, amino acid
conservation, RNA stability, hypersaline adaptation and growth
conditions [3-5]. Among these factors, mutation pressure and
translational selection are thought to be the key factors shaping
codon usage bias [6].

Viruses are obligate intracellular parasites that depend on host
cells for genome replication and protein synthesis. Viral codon usage
bias has been reported to be determined by both the virus itself and
its host. As in other organisms, mutation pressure and translational
selection play key roles in shaping viral codon usage bias [7-10].
Other factors that affect viral codon usage bias include fine-tuning
of translation kinetics [11,12], codon pair bias [13], and escape
from cellular antiviral responses through a reduction of CpG
dinucleotides [14]. Studies of viral codon usage bias can improve our
knowledge not only of virus evolution but also of specific
interactions between a virus and its host. The codon usage patterns
of animal viruses, including human immunodeficiency virus type 1 and
hepatitis A virus, have been studied extensively [11,15-19]. For
plant viruses, studies of this type are still rare [8,20,21].

Citrus tristeza virus (CTV), the causal agent of citrus 
tristeza disease, is a notorious plant RNA virus. CTV 



© 2012 Cheng et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative
Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and
reproduction in any medium, provided the original work is properly cited.



Cheng et al. Virology Journal 2012, 9:113
http://www.virologyj.com/content/9/1/113






causes tremendous economic losses to citrus industries worldwide
[22]. CTV is a non-enveloped, single-stranded, positive-sense RNA
virus belonging to the genus Closterovirus in the family
Closteroviridae [23]. The genomic RNA (gRNA) of CTV is approximately
19.3 kb in length and contains 12 open reading frames (ORFs) that,
from the 5' to the 3' end, are ORF1a, ORF1b, p33, p6, p65, p61, p27
(encodes the minor coat protein), p25 (encodes the major coat
protein), p18, p13, p20, and p23. The 12 ORFs are ultimately
translated into at least 19 different proteins [24]. ORF1a and ORF1b
are translated directly from the gRNA and encode proteins that are
required for CTV replication. The other ORFs, present on
3'-coterminal subgenomic RNAs, encode proteins that are necessary for
CTV replication (e.g. p65 and p61), virion assembly (p65, p61, p27
and p25) [25], virus movement (p65, p61, p6, p20) [26], symptom
development and asymmetrical accumulation of positive- and
negative-strand viral RNAs during CTV infection (p23) [27-29], and
suppression of RNA silencing (p25, p20 and p23) [30]. The functions
of the CTV p33, p18 and p13 proteins have not been determined.

Isolates of CTV can cause different disease symptoms 
(i.e. yellowing canopies, declining and stunting of trees, 
and stem pitting) on different indicator citrus plants, 



indicating the existence of a highly diversified genetic population
of CTV in nature [31]. Previous phylogenetic and genetic marker
analyses showed that CTV consists of several genetically distinct
genotypes [32,33]. Previous studies also showed that RNA
recombination, negative selection and gene flow are important forces
that drive the evolution of CTV [34-38]. However, the contribution of
codon usage bias to CTV evolution remains unclear. In this study, a
detailed comparative analysis was performed using the coding regions
for all CTV proteins (hereafter referred to as the full coding
region) to determine the CTV codon usage pattern. Our results show
that CTV has a high level of codon usage resemblance to its citrus
host, suggesting that codon usage adaptation may also have an
important role in CTV evolution.

Results 

Nucleotide composition properties of CTV full coding 
region 

The effective number of codons (Nc) of the 20 selected CTV isolates
was determined to generate an overall view of the codon usage
patterns. Table 1 shows that the Nc values of the 20 selected CTV
isolates varied from 51.9 to 54.8, with an average value of
53.0 ± 0.6641. This



Table 1 Nucleotide contents of CTV

Isolate no.  A%     A3%    U%     U3%    C%     C3%    G%     G3%    (G+C)%  (G+C)3%  Nc
1            26.4   20.3   29.9   36.3   17.2   21.6   25.0   22.1   42.2    43.8     53.0
2            27.0   22.3   29.9   36.3   17.2   21.9   24.4   20.0   41.6    41.8     53.8
3            26.8   21.7   30.1   36.5   17.0   21.6   24.7   20.6   41.7    42.2     52.2
4            26.5   20.4   29.9   36.3   17.1   21.8   24.9   21.9   42.1    43.7     52.9
5            26.6   20.8   30.0   36.7   17.1   21.4   24.9   21.5   41.9    42.9     52.7
6            26.7   21.0   29.9   36.2   17.2   22.1   24.7   21.2   41.9    43.2     54.2
7            26.6   20.4   30.0   36.5   17.3   21.9   24.6   21.6   41.9    43.5     53.0
8            26.6   20.4   30.1   36.7   17.2   21.8   24.6   21.6   41.8    43.4     53.0
9            26.8   21.0   30.2   37.1   16.9   21.1   24.6   21.2   41.6    42.3     52.5
10           26.8   21.2   30.1   36.9   17.1   21.0   24.6   21.2   41.7    42.3     52.6
11           26.7   21.0   30.0   36.5   17.0   21.6   24.7   21.4   41.8    42.9     52.9
12           26.9   22.1   30.2   36.8   17.0   21.6   24.3   19.9   41.4    41.5     51.9
13           26.6   20.8   30.1   36.6   17.3   22.1   24.5   20.9   41.8    43.0     54.8
14           26.5   20.9   30.3   36.9   17.2   21.7   24.5   20.9   41.7    42.7     53.6
15           26.4   20.9   30.3   36.9   17.1   21.7   24.6   20.9   41.7    42.6     53.5
16           26.5   20.8   30.3   36.8   17.2   21.9   24.5   20.9   41.7    42.8     53.6
17           26.5   21.0   30.4   37.1   17.0   21.5   24.5   20.8   41.5    42.3     53.4
18           26.4   20.7   30.2   36.4   17.4   22.4   24.4   20.8   41.8    43.2     52.8
19           26.5   20.7   29.9   35.9   17.4   22.7   24.7   21.1   42.1    43.8     52.6
20           26.5   20.7   30.0   36.0   17.4   22.7   24.6   21.1   42.0    43.8     52.6
Average      26.6   20.9   30.1   36.5   17.2   21.9   24.6   21.1   41.8    43.0     53.0

finding suggests that CTV does not possess an excessive overall codon
usage bias and that the variation in codon usage bias among CTV
isolates is small.

Nucleotide abundance was then calculated as another indicator of
codon usage bias for CTV (Table 1). The overall guanine and cytosine
(G + C) content in the CTV full coding region ranges from 41.4 to
42.2% with an average of 41.8 ± 0.21, and the G + C content at the
synonymous sites, (G + C)3, ranges from 41.5 to 43.8% with an average
of 43.0 ± 0.68 (Table 1). These results indicate that the variation
in (G + C) content among CTV isolates, both in the full coding region
and at synonymous sites, is small. Comparing the A, U, G and C
contents at the synonymous sites (abbreviated as A3, U3, G3 and C3),
it is clear that the U3 value is the highest, ranging from 35.9 to
37.1% with an average of 36.5 ± 0.36. Thus the major codons used by
CTV are U-ended. Further comparison of the U, C, G and A contents
with the U3, C3, G3 and A3 contents indicated that U and C were
significantly enriched at the synonymous sites, whereas G and A were
significantly depleted at these sites (t test, P < 0.001). To
generate a visual display of the main features of the codon usage
pattern, as described previously by Wright [39], we constructed an
Nc-plot, a plot of Nc versus (G + C)3. In this Nc-plot (Figure 1),
all the CTV isolates clustered together and deviated slightly from
the expected curve, which represents the expected codon usage when
G + C compositional constraints alone account for the codon usage
bias [39]. Our finding implies that CTV is subjected to G + C
compositional constraints.
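
The expected curve in an Nc-plot can be sketched with the approximation given by Wright [39], Nc = 2 + s + 29 / (s^2 + (1 - s)^2), where s is the (G + C)3 fraction. The short Python sketch below is only an illustration: the function name is an assumption, and the two input values are the averages from Table 1 rather than per-isolate data.

```python
def expected_nc(gc3: float) -> float:
    """Expected Nc when G+C composition alone shapes codon usage (Wright's curve).

    gc3 is the G+C fraction at synonymous third positions, between 0 and 1.
    """
    if not 0.0 <= gc3 <= 1.0:
        raise ValueError("gc3 must be a fraction between 0 and 1")
    return 2.0 + gc3 + 29.0 / (gc3 ** 2 + (1.0 - gc3) ** 2)


# Average values for the 20 CTV isolates, taken from Table 1.
observed_nc = 53.0
gc3_fraction = 43.0 / 100.0

expected = expected_nc(gc3_fraction)
print(f"expected Nc = {expected:.1f}, observed Nc = {observed_nc:.1f}")
```

An observed Nc below the value returned by `expected_nc` corresponds to a point lying under the expected curve in the plot.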

To further confirm this conclusion, we analyzed the 
cumulative relative synonymous codon usage (RSCU) 
values for the 20 selected CTV isolates with a total 




Figure 1 Nc-plot of Nc values versus (G + C)3 contents of CTV
isolates. The blue curve is the expected curve when all codons are
used randomly (no selection) and is calculated using the formula
reported previously by Wright [39].



number of 123,535 synonymous codons (Table 2). For amino acids
(except Leu) that have more than two synonymous codons (e.g. Val,
Ser, Pro, Thr, Gly, Arg, Ala and Ile), the codons with the highest
RSCU values all end with U. Among the amino acids that have two
synonymous codons ending with U or C (e.g. Phe, His, Asn, Asp, Cys
and Tyr), only Tyr displayed a weak preference for the C-ended codon
(UAC). The RSCU

Table 2 Relative synonymous codon usage (RSCU) values in the full
coding region of CTV and Citrus sinensis

AA (a)  Codon  N (b)  CTV (c)   CS (d)   AA    Codon  N      CTV    CS
Phe     UUU    4888   1.26 (e)  1.05     Gln   CAA    1603   1.24   1.06
        UUC    2888   0.74      0.95           CAG    986    0.76   0.94
Leu     UUA    3748   1.57      0.77     His   CAU    1535   1.03   1.08
        UUG    4825   2.02      1.40           CAC    1453   0.97   0.92
        CUU    2149   0.90      1.58     Asn   AAU    2997   1.00   1.07
        CUC    1216   0.51      0.91           AAC    2992   1.00   0.93
        CUA    1053   0.44      0.53     Lys   AAA    3604   0.94   0.86
        CUG    1354   0.57      0.80           AAG    4077   1.06   1.14
Val     GUU    5171   1.52      1.61     Asp   GAU    4988   1.10   1.29
        GUC    2302   0.68      0.67           GAC    4101   0.90   0.71
        GUA    1861   0.55      0.48     Glu   GAA    4188   1.12   0.95
        GUG    4277   1.26      1.24           GAG    3262   0.88   1.05
Ser     UCU    3742   1.50      1.38     Arg   AGA    2080   1.13   1.82
        UCC    2014   0.81      0.77           AGG    2032   1.11   1.82
        UCA    1642   0.66      1.33           CGU    2814   1.53   0.68
        UCG    3125   1.25      0.71           CGC    1599   0.87   0.56
        AGU    2791   1.12      0.87           CGA    1343   0.73   0.58
        AGC    1642   0.66      0.93           CGG    1136   0.62   0.54
Pro     CCU    2261   1.62      1.31     Cys   UGU    2235   1.24   0.98
        CCC    984    0.71      0.87           UGC    1356   0.76   1.02
        CCA    845    0.61      1.25     Tyr   UAU    2500   0.89   1.05
        CCG    1476   1.06      0.57           UAC    3107   1.11   0.95
Thr     ACU    3455   1.74      1.45     Ala   GCU    3824   1.68   1.58
        ACC    1676   0.84      0.83           GCC    1558   0.68   0.86
        ACA    891    0.45      1.18           GCA    1373   0.60   1.11
        ACG    1933   0.97      0.53           GCG    2373   1.04   0.45
Gly     GGU    4337   1.99      1.13     Ile   AUU    2562   1.21   1.37
        GGC    1311   0.60      0.99           AUC    1681   0.79   0.93
        GGA    1329   0.61      1.07           AUA    2135   1.00   0.70
        GGG    1759   0.81      0.81

(a) AA, amino acid.
(b) N, the total number of each codon used by the 20 CTV isolates.
(c) CTV, the mean RSCU values of CTV.
(d) CS, the mean RSCU values of Citrus sinensis.
(e) The preferred codons are underlined. A preferred codon is defined
as the codon with the highest RSCU value among all available
synonymous codons for a given amino acid. However, a codon with the
highest RSCU value that is lower than 1.1 is not defined as a
preferred codon, since this value is not statistically significant at
the 95% confidence level.

values for amino acids that have two synonymous codons ending with A
or G (e.g. Gln, Lys and Glu) are similar, indicating a similar codon
usage frequency (Table 2). These results demonstrate that CTV likely
prefers U-ended codons.
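
The preferred-codon rule stated in the footnote of Table 2 (the codon with the highest RSCU value in its synonymous family, provided that value is not lower than 1.1) can be written as a small Python sketch. The function name and the dictionary layout are assumptions made for illustration; the RSCU values are the CTV means copied from Table 2.

```python
def preferred_codon(rscu_by_codon, threshold=1.1):
    """Return the preferred codon of one synonymous family, or None.

    rscu_by_codon maps each synonymous codon to its RSCU value. If two
    codons tie for the highest value, the first one listed is returned.
    """
    best = max(rscu_by_codon, key=rscu_by_codon.get)
    return best if rscu_by_codon[best] >= threshold else None


# Mean CTV RSCU values for three amino acids, copied from Table 2.
ctv_rscu = {
    "Gly": {"GGU": 1.99, "GGC": 0.60, "GGA": 0.61, "GGG": 0.81},
    "Tyr": {"UAU": 0.89, "UAC": 1.11},
    "His": {"CAU": 1.03, "CAC": 0.97},
}

for amino_acid, values in ctv_rscu.items():
    print(amino_acid, preferred_codon(values))
```

With these inputs the rule selects a U-ended codon for Gly and the C-ended codon for Tyr, and reports no preferred codon for His because its highest RSCU value is below 1.1.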

Codon usage patterns of CTV and its host, Citrus sinensis

To compare the codon usage patterns of CTV and its host, we
downloaded the codon usage pattern of C. sinensis from the Codon
Usage Database (http://www.kazusa.or.jp/codon/). Interestingly, our
analysis shows that most of the C. sinensis preferred codons are also
U-ended (Table 2). We then calculated the codon nucleotide abundance
for C. sinensis and compared it with that of CTV. It was reported
previously that, for synonymous codons, the second nucleotide site
has the strongest constraint, followed by the first nucleotide site
[40]. As shown in Figure 2A, CTV has an almost identical nucleotide
abundance to C. sinensis at the second nucleotide site. At the first
nucleotide site, a similar trend is also evident, with slight
variations between the two species. At the third nucleotide site,
both CTV and C. sinensis showed a high content of U, indicating that
U is preferred by both CTV and C. sinensis at the synonymous sites.
Interestingly, the second most abundant nucleotide at the synonymous
sites of C. sinensis is C, which is over-represented at the CTV
synonymous sites (Table 1). Furthermore, the observed codon usage
frequencies of CTV are highly correlated with those of C. sinensis
(R = 0.826, P < 0.01) (Figure 2B), indicating that the codon usage of
CTV has a high level of resemblance to that of C. sinensis.

Codon usage variations among CTV genotypes 

CTV is known to have several distinct biological genotypes [31-33].
To determine the codon usage variations among these CTV genotypes, a
phylogenetic tree was constructed using the full coding region of
CTV. Similar to the phylogenetic tree constructed using the CTV
full-length genomic sequences [33], the yellowing and stem pitting
isolates clustered in one group (group 1), the quick declining
isolates clustered in group 2, and isolates that are capable of
breaking CTV resistance in trifoliate orange (Poncirus trifoliata)
clustered in group 3 (Figure 3A). To determine the variation of codon
usage among the CTV genotypes, we conducted a correspondence analysis
(COA), a method used to detect major trends in codon usage variation
between genes or organisms [41], based on the RSCU values from the 20
selected CTV isolates. The COA extracted two major axes. Axis 1
explains 37.98% and Axis 2 accounts for 17.18% of the total variation
observed. A plot of the two major axes is shown in Figure 3B. In the
plot, the three
Figure 2 Comparative analysis of nucleotide composition and
codon abundance of CTV and C. sinensis. (A) Frequencies of the four
nucleotides at the three positions within a codon for the full CTV
coding region (upper panel) and the coding region of C. sinensis
(lower panel). (B) Correlation of the codon abundances between CTV
and C. sinensis.



phylogenetically distinct groups cluster in three separate regions,
indicating that these three CTV groups have different trends in codon
usage.

A correlation analysis was performed using the nucleotide
compositions at the synonymous sites and the two major axes obtained
from the COA (Table 3). This analysis allows us to identify the
nucleotide contents that are responsible for the variations [19,42].
Results of the analysis show that only C3 has a clear correlation
with both major axes. This indicates that although U is the
Figure 3 Codon usage variations of CTV genotypes. (A) NJ phylogenetic tree of CTV constructed using the entire coding region. The
yellowing and stem-pitting CTV group is colored in blue, the trifoliate orange resistance-breaking CTV group is colored in purple, and the quick
declining CTV group is colored in yellow. (B) Distribution of CTV codon usage variation along the first and second axes based on the COA.
Coordinates of the three CTV groups are colored in the same way as in Figure 3A.



most preferred nucleotide at the synonymous sites, the codon usage
variations found among the CTV genotypes are determined by the
content of C at the synonymous sites.

Codon usage adaptation of closteroviruses 

The high degree of CTV codon usage adaptation to its host suggests
that such adaptation may be a common phenomenon between
closteroviruses and their hosts. To test this hypothesis, the
full-length genome sequences of beet yellows virus (AF056575, BYV),
carrot yellow leaf virus (NC_013007, CYLV), grapevine rootstock stem
lesion associated virus (NC_004724, GRSLaV) and grapevine
leafroll-associated virus 2 (NC_007448, GRSLaV-2) were downloaded
from GenBank. The empirical codon frequency of each virus was
calculated and compared with that of its host plant: Beta vulgaris
(beet) for BYV, Daucus carota (carrot) for CYLV, and Vitis vinifera
(grapevine) for GRSLaV and GRSLaV-2. Results shown in Figure 4
indicate that a significant correlation (P < 0.01) is observed
between grapevine and its two viruses (GRSLaV and GRSLaV-2) but not
between beet and BYV or carrot and CYLV. This



Table 3 Analysis of correlation between the first two
principal axes and nucleotide compositions

Nucleotide contents   Axis 1     Axis 2
A3                    -0.109     -0.380
U3                     0.022     -0.589**
G3                    -0.356      0.299
C3                     0.539**    0.504*

** Correlation is significant at the 0.01 level (2-tailed).
* Correlation is significant at the 0.05 level (2-tailed).



finding shows that codon usage adaptation to a host is 
not a common phenomenon of closteroviruses. It occurs 
only in some closteroviruses. 

Discussion 

In this study, a detailed comparative analysis was performed to
determine CTV codon usage bias. Our results show that in general CTV
does not have a high degree of codon usage bias (average Nc = 53.0,
Table 1), and mutational bias is likely to be the major force that
drives CTV codon usage bias (Figure 1). This finding supports
previous reports that mutational bias is the major force affecting
codon usage in other viruses [7,8]. However, the deviation of the
coordinates from the expected curve in the Nc-plot cannot be
explained simply by mutational bias, as suggested previously by
Wright [39]. It is possible that this deviation is caused by either
G/C-biased mutation pressure or negative/positive selection of codons
ending with C and/or G, as described before [39]. Indeed, comparing
the A, U, G and C contents in the full coding region with those at
the synonymous codon sites shows that C, in addition to U, is
over-represented at the synonymous codon sites (Table 1).
Interestingly, analysis of the selective pressures acting on
different codons suggested that the full coding region of CTV is
subjected mostly to purifying selection, as described by Martin et
al. [35]. It is possible that the enrichment of C at the synonymous
sites is caused by negative selection rather than by C-biased
mutational pressure. Furthermore, results of the COA show that the C
content at the synonymous sites is the major factor that determines
the codon usage variation among the CTV genotypes (Table 3). Because
different CTV
Figure 4 Correlations of the codon abundances of closteroviruses and their respective host species. (A) Beet yellows virus (BYV) versus
Beta vulgaris (beet); (B) Carrot yellow leaf virus (CYLV) versus Daucus carota (carrot); (C) Grapevine rootstock stem lesion associated virus (GRSLaV)
versus Vitis vinifera (grapevine); (D) Grapevine leafroll-associated virus 2 (GRLaV-2) versus V. vinifera. The codon usage patterns of beet, carrot and
grapevine were downloaded from the Codon Usage Database (http://www.kazusa.or.jp/codon/).



genotypes were reported to have different host origins 
[43], the enrichment of C at the synonymous sites is 
likely caused by the selection of the host. 

Our results also show that the codon usage of CTV has a high level of
resemblance to that of its citrus host. This is because i) both CTV
and citrus have a significantly higher content of U at the synonymous
codon sites; ii) most of the preferred codons in CTV and citrus are
the same; and iii) a high correlation exists in codon frequencies
between CTV and citrus. This result is understandable when
considering the specific relationship between CTV and its host. CTV
is restricted to citrus and it is generally accepted that the virus
co-evolved with the host species [44]. Moreover, citrus is a woody
plant and can grow in the field for hundreds of years [22]. After
successful infection, the virus can survive in this host for a very
long period of time. This long-term infection gives CTV an
opportunity to select and adapt optional codons generated during
virus replication. As discussed above, the C3 content is the major
factor that determines the codon usage variation among the CTV
genotypes. Our data also indicate that the degrees of codon usage
adaptation to C. sinensis differ among CTV genotypes, suggesting that
the codon usage variation may reflect


specific interactions between the CTV genotypes and their original
hosts. Because detailed genetic information on the original citrus
hosts of CTV is missing, we are unable to confirm the codon usage
adaptation of CTV genotypes to their respective hosts. Nevertheless,
the results presented in this paper show that CTV and citrus are an
ideal model for studies of virus and host coevolution.

Bahir et al. suggested previously that adaptation of codon usage
varies among different viral genes and that the highest degree of
adaptation is observed for genes that are expressed at high levels in
cells, such as the viral CP [21]. In this study we also tried to
analyze the variation of codon usage among CTV genes and the
different host effects on these genes. However, this attempt was
unsuccessful because the number of codons used by some CTV genes is
limited and thus many synonymous codons may not be observed. This may
cause artifacts when comparing the codon usage frequency of a virus
with that of its host.

High adaptation of codon usage was previously reported for several
viruses, including those belonging to the family Flaviviridae, and
bacteria-infecting and human viruses [14,21]. We proposed that the
high codon adaptation phenomenon might exist in all viruses in the
genus Closterovirus, since the codon usage patterns of different
closteroviruses closely resemble each other (data not shown).
However, our results show that the high degree of codon resemblance
is observed only between the woody plant-infecting closteroviruses
and their woody hosts, but not between the herbaceous plant-infecting
closteroviruses and their herbaceous hosts (Figure 4). This
difference may be caused partially by the different longevity of
closteroviruses in their infected herbaceous or woody plants. It is
known that the woody plant-infecting closteroviruses can exist in
their host plants for a very long period of time. In addition, all
woody plant-infecting closteroviruses infect only a few closely
related species within the same genus. This narrow host range may
also have a role in this unusually high codon adaptation. For
example, the natural hosts of CTV are limited to a few species within
the genus Citrus [22].

Conclusion 

A detailed comparative analysis of the CTV codon usage pattern was
performed in this study. The results show that the overall codon
usage of CTV closely resembles that of its host, C. sinensis. Our
results also show that the codon usage resemblance is observed only
for the woody plant-infecting closteroviruses and not for the
closteroviruses infecting herbaceous host plants. This observation
implies the existence of different virus-host interactions between
the herbaceous plant-infecting and woody plant-infecting
closteroviruses. In conclusion, our results indicate that, in
addition to RNA recombination, negative selection and gene flow, host
codon usage selection can also have an important role in CTV
evolution.

Materials and methods 

Source of sequence data 

Full-length genome sequences of CTV, BYV, CYLV, GRSLaV, and GRSLaV-2
were downloaded from GenBank (http://www.ncbi.nlm.nih.gov/). To
establish a sequence data set for CTV, isolates sharing less than 98%
sequence identity were downloaded, and the final data set consists of
20 CTV isolates (Table 4). The accession numbers and other
information on these isolates are listed in Table 4. For codon usage
analysis, open reading frames (ORFs) with fewer than 150 nucleotides
were excluded as described before [45].
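
The ORF length filter described above can be sketched in a few lines of Python. This is only an illustration of the 150-nucleotide cutoff: the function name, the dictionary layout and the two example ORFs are hypothetical placeholders, not CTV data or the authors' actual pipeline.

```python
MIN_ORF_LENGTH = 150  # nucleotides; the cutoff stated in the text


def filter_orfs(orfs, min_length=MIN_ORF_LENGTH):
    """Return only the ORFs whose sequence is at least min_length nt long."""
    return {name: seq for name, seq in orfs.items() if len(seq) >= min_length}


# Hypothetical ORFs; the names and sequences are placeholders, not CTV data.
example_orfs = {
    "orf_a": "AUG" + "GCU" * 60 + "UAA",  # 186 nt, kept
    "orf_b": "AUG" + "GCU" * 20 + "UAA",  # 66 nt, excluded
}

kept = filter_orfs(example_orfs)
print(sorted(kept))
```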

The codon usage patterns of C. sinensis, B. vulgaris, D. carota, and
V. vinifera were downloaded from the Codon Usage Database
(http://www.kazusa.or.jp/codon/), which is tabulated from all
available sequences in the international DNA sequence databases [46].

Phylogenetic analysis 

The phylogenetic tree was constructed using the neighbor-joining (NJ)
method implemented in the MEGA 5.0 software [47]. The nucleotide
substitution model, mutation



Table 4 Information on the 20 CTV isolates used in this study

Isolate no.  Strain name  Length (nt) (a)  Biological property  Accession No.
1            B165         18585            YSP (b)              EU076703
2            kpg3         18555            YSP                  HM573451
3            HA16-5       18567            YSP                  GQ454870
4            NZ-B18       18498            YSP                  FJ525436
5            SP           18498            YSP                  EU857538
6            T318A        18576            YSP                  DQ151548
7            T30          18495            YSP                  AF260651
8            T385         18495            YSP                  Y18420
9            \^-FS2-2     18549            YSP                  EU937519
10           VT-Israel    18474            YSP                  U56902
11           Nuaga        18549            YSP                  AB046398
12           HA18-9       18549            RB (c)               GQ454869
13           NZRB-G90     18498            RB                   FJ525432
14           NZRB-TH28    18498            RB                   FJ525433
15           NZRB-TH30    18513            RB                   FJ525434
16           NZRB-M12     18498            RB                   FJ525431
17           NZRB-M17     18516            RB                   FJ525435
18           Mexico       18516            QD (d)               DQ272579
19           Qaha         18588            QD                   AY340974
20           T36          18588            QD                   NC_001661

(a) Non-coding regions were excluded.
(b) YSP, yellowing and stem pitting.
(c) RB, resistance breaking.
(d) QD, quick declining.

rate and mutation pattern were determined using the Model Selection
function, also in the MEGA 5.0 software. Bootstrap confidence values
are based on 1000 replicates.

Composition analysis of full coding regions of CTV 
isolates 

Analysis of the compositional properties of all CTV ORFs, including
(G + C), (G + C)3, A3, U3, G3 and C3, was performed using CodonW
version 1.4.2 (John Peden, available at
http://codonw.sourceforge.net/index.html). The nucleotide contents at
the first and second codon positions were calculated as described
previously by Wang et al. [48].
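
As a rough illustration of what these composition indices measure, the Python sketch below computes overall and third-position base percentages for a single in-frame coding sequence. It is a simplification and not CodonW's procedure: it counts every third codon position, whereas a tool reporting values at synonymous sites may exclude Met, Trp and stop codons. The function name and the toy sequence are assumptions.

```python
from collections import Counter


def base_composition(cds):
    """Percentages of A, U, G and C overall and at the third codon position."""
    seq = cds.upper().replace("T", "U")
    if len(seq) == 0 or len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a positive multiple of 3")
    overall = Counter(seq)
    third = Counter(seq[2::3])  # every third base, starting at position 3
    n_all, n_third = len(seq), len(seq) // 3
    result = {}
    for base in "AUGC":
        result[f"{base}%"] = 100.0 * overall[base] / n_all
        result[f"{base}3%"] = 100.0 * third[base] / n_third
    result["(G+C)%"] = result["G%"] + result["C%"]
    result["(G+C)3%"] = result["G3%"] + result["C3%"]
    return result


# Hypothetical five-codon sequence (AUG GCU UUU GGU UAA), for illustration only.
toy = "AUGGCUUUUGGUUAA"
composition = base_composition(toy)
for key in ("U%", "U3%", "(G+C)%", "(G+C)3%"):
    print(key, round(composition[key], 1))
```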

Measurement of effective number of codons 

The effective number of codons (Nc) has been used as a measure of
synonymous codon usage bias in genes and is considered to be
independent of gene length and amino acid composition [39]. The Nc
value, which ranges from 20 to 61, is often used to determine the
degree of codon usage bias in a gene [39]. For example, a gene with
an Nc value at or below 35 is considered to have a strong codon usage
bias, whereas an Nc value of 61 indicates that all available codons
are used equally [39]. In this study the Nc values were calculated
using CodonW version 1.4.2.
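
To make the 20 to 61 range concrete, the following Python sketch implements a simplified form of Wright's estimator: a homozygosity value F = (n * sum(p_i^2) - 1) / (n - 1) is computed for each amino acid, averaged within each degeneracy class, and combined as Nc = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6. It is an assumption-laden illustration, not CodonW's implementation; the function and constant names are hypothetical, and the handling of rare or missing amino acids is deliberately minimal.

```python
from collections import Counter, defaultdict

# Standard genetic code, codons enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
# Number of amino acids in each degeneracy class.
CLASS_SIZES = {2: 9, 3: 1, 4: 5, 6: 3}


def effective_number_of_codons(cds):
    """Simplified Nc estimate for one coding sequence (RNA or DNA letters)."""
    seq = cds.upper().replace("T", "U")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))

    # Group codon counts by amino acid, skipping stop, Met and Trp codons.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa not in "*MW":
            families[aa].append(counts[codon])

    # Homozygosity F for each amino acid, collected per degeneracy class.
    f_values = defaultdict(list)
    for family in families.values():
        n = sum(family)
        if n > 1:
            f = (n * sum((x / n) ** 2 for x in family) - 1.0) / (n - 1.0)
            if f > 0:
                f_values[len(family)].append(f)

    mean_f = {k: sum(v) / len(v) for k, v in f_values.items()}
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2.0  # fallback when Ile is unusable
    missing = [k for k in CLASS_SIZES if k not in mean_f]
    if missing:
        raise ValueError(f"no usable amino acids in degeneracy classes {missing}")

    nc = 2.0 + sum(size / mean_f[k] for k, size in CLASS_SIZES.items())
    return min(nc, 61.0)  # Nc is capped at the number of sense codons


# Hypothetical extreme case: a single codon is used for every amino acid.
one_codon_each = {}
for codon, aa in CODON_TABLE.items():
    if aa not in "*MW":
        one_codon_each.setdefault(aa, codon)
biased = "".join(codon * 5 for codon in one_codon_each.values())
print(effective_number_of_codons(biased))
```

The artificial sequence at the end uses exactly one codon per amino acid, so every F equals 1 and the estimate reaches the lower bound of 20; a sequence that uses all synonymous codons equally is capped at 61.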

Measurement of relative synonymous codon usage 
(RSCU) 

The RSCU value is the ratio of the observed to the expected frequency
of a codon and reflects the bias of synonymous codon usage without
the influence of amino acid composition and the abundance of
synonymous codons [49]. An RSCU value above 1.0 indicates a positive
codon usage bias, a value below 1.0 implies a negative codon usage
bias, and a value of 1.0 indicates no codon usage bias for the
synonymous codons [49]. In this study the RSCU values were calculated
using the General Codon Usage Analysis (GCUA) software available at
http://bioinf.may.ie/GCUA/calculatecodon.html [50].
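
The definition above amounts to dividing each codon count by the count expected if all synonymous codons of that amino acid were used equally. The Python sketch below illustrates this for one synonymous family; the function name is an assumption and the code is not the GCUA implementation. The counts are the pooled Phe and Gly codon totals (column N) from Table 2, so the result is the RSCU of the pooled counts, which need not equal the mean of per-isolate RSCU values exactly.

```python
def rscu(codon_counts):
    """RSCU values for one amino acid's family of synonymous codons.

    codon_counts maps each synonymous codon to its observed count.
    """
    total = sum(codon_counts.values())
    if total == 0:
        raise ValueError("the amino acid is not observed")
    expected = total / len(codon_counts)  # count under equal codon usage
    return {codon: count / expected for codon, count in codon_counts.items()}


# Pooled codon counts for Phe and Gly, taken from column N of Table 2.
phe_counts = {"UUU": 4888, "UUC": 2888}
gly_counts = {"GGU": 4337, "GGC": 1311, "GGA": 1329, "GGG": 1759}

for name, counts in (("Phe", phe_counts), ("Gly", gly_counts)):
    print(name, {codon: round(value, 2) for codon, value in rscu(counts).items()})
```

For these two families the rounded values agree with the CTV column of Table 2 (for example 1.26 for UUU and 1.99 for GGU).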

Correspondence analysis (COA) of synonymous codon 
usage 

COA is a commonly used multivariate statistical analysis method [51]
and has been used to investigate the major trends in codon usage
variation between genes or organisms [19,41,42]. In this study, COA
was used to analyze codon usage variations between CTV isolates. In
the analysis, the RSCU values of the synonymous codons (excluding
Met, Trp and the three termination codons) were treated as
59-dimensional vectors. Therefore, each CTV isolate is represented by
59 coordinates (RSCU values). The calculation was done using the
CodonW 1.4.2 software.

Correlation analysis 

Correlation analysis was performed to determine the relationship
between nucleotide composition and synonymous codon usage pattern
using the Spearman's rank correlation analysis implemented in the
SPSS 16.0 software (SPSS Inc., USA).
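
For readers without access to SPSS, Spearman's rank correlation can be sketched in plain Python as the Pearson correlation of average ranks. This is a generic illustration, not the SPSS procedure; the function names are assumptions, the C3 values are those of isolates 1 to 5 in Table 1, and the axis coordinates are hypothetical numbers invented for the example.

```python
from math import sqrt


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rank correlation coefficient."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    return pearson(average_ranks(x), average_ranks(y))


c3 = [21.6, 21.9, 21.6, 21.8, 21.4]      # C3% of isolates 1-5 from Table 1
axis1 = [0.10, 0.25, 0.05, 0.20, -0.15]  # hypothetical axis coordinates
print(round(spearman(c3, axis1), 3))
```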